79 results found.
Written
Corpus,
Language Type:
Monolingual
Languages:
English Finnish French German Russian Swedish
Availability:
Freely Available
License:
CC-BY-NC
Size:
2 GByte
Production Status:
Existing-used
Use:
Textual Entailment and Paraphrasing
- Paper title: Paraphrase Generation and Evaluation on Colloquial-Style Sentences
- Paper track: Evaluation/oral presentation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Eetu Sjöblom | Opusparcus | /N |
Documentation:
Documentation publicly available in English at the URL entered above.
Written
Corpus,
Language Type:
Multilingual
Languages:
Dutch English German Swedish
Availability:
Part freely available, part through search interface
License:
mixed CC and "for research purposes after registration"
Size:
None tokens
Production Status:
Newly created-in progress
Use:
historical linguistic research
- Paper title: The EDGeS Diachronic Bible Corpus
- Paper track: Written/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Gerlof Bouma | EDGeS Diachronic Bible Corpus | /N |
Documentation:
https://spraakbanken.gu.se/en/projects/complex-verb-constructions
Written
Corpus,
Language Type:
Multilingual
Languages:
German Hindi Italian Spanish Swedish
Availability:
Freely Available
License:
OpenSource
Size:
184880 sentences
Production Status:
Existing-updated
Use:
Parsing and Tagging
- Paper title: Semi-Supervised Dependency Parsing with Arc-Factored Variational Autoencoding
- Paper track: Long paper/
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Ge Wang | Universal Dependencies | /N |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
Swedish
Availability:
Discussions are ongoing with the university lawyers about access to the resource. It contains personal information and access will be restricted one way or another, but the details are not clear yet.
License:
Size:
still in progress
Production Status:
Newly created-in progress
Use:
Intended for research on second language acquisition, covering a range of tasks: pseudonymization, error correction, text classification by complexity/level/grade/genre/topic, language identification, etc.
- Paper title: Towards Privacy by Design in Learner Corpora Research: A Case of On-the-fly Pseudonymization of Swedish Learner Essays
- Paper track: Long paper/
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Elena Volodina | SweLL learner corpus | /N |
Documentation:
Documentation is continuously updated at this page: https://spraakbanken.gu.se/en/projects/swell
Written
Corpus,
Language Type:
Multilingual
Languages:
Czech English French German Spanish Swedish
Availability:
Freely Available
License:
CreativeCommons
Size:
7 GByte
Production Status:
Existing-used
Use:
Information Extraction, Information Retrieval
- Paper title: Document Translation vs. Query Translation for Cross-Lingual Information Retrieval in the Medical Domain
- Paper track: Long/Information Retrieval and Text Mining
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Shadi Saleh | Extended CLEF eHealth 2013-2015 IR Test Collection | /N |
Documentation:
None
Written
Corpus,
Language Type:
Multilingual
Languages:
Czech English French German Hungarian Polish Spanish Swedish
Availability:
Freely Available
License:
CreativeCommons
Size:
2 MByte
Production Status:
Existing-used
Use:
Information Extraction, Information Retrieval
- Paper title: Document Translation vs. Query Translation for Cross-Lingual Information Retrieval in the Medical Domain
- Paper track: Long/Information Retrieval and Text Mining
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Shadi Saleh | Khresmoi Summary Translation Test Data 2.0 | /N |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
Bulgarian Croatian Czech Danish Dutch English Estonian Finnish French German Greek Hungarian Icelandic Irish Italian Latvian Lithuanian Maltese Polish Portuguese Romanian Slovak Slovenian Spanish Swedish
Availability:
Freely Available
License:
CC-0
Size:
341856530 sentences
Production Status:
Newly created-in progress
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: ParaCrawl: Web-Scale Acquisition of Parallel Corpora
- Paper track: Long/Resources and Evaluation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Philipp Koehn | ParaCrawl | /N |
Documentation:
None
Language Type:
Multilingual
Languages:
Swedish
Availability:
Freely Available
License:
CreativeCommons
Size:
100000 tokens
Production Status:
Newly created-in progress
Use:
Word Sense Disambiguation
- Paper title: A Multi-domain Corpus of Swedish Word Sense Annotation
- Paper track: Written
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Author 1 | Richard Johansson | University of Gothenburg | SE |
| Author 2 | Yvonne Adesam | University of Gothenburg | SE |
| Author 3 | Gerlof Bouma | University of Gothenburg | SE |
| Author 4 | Karin Hedberg | University of Gothenburg | SE |
| Main Contact | Richard Johansson | University of Gothenburg | None |
Documentation:
<Not Specified>
Written
Lexicon,
Language Type:
Multilingual
Languages:
Swedish
Availability:
<Not Specified>
License:
CC-BY
Size:
69700 words
Production Status:
Newly created-finished
Use:
Opinion Mining/Sentiment Analysis
- Paper title: SenSALDO: Creating a Sentiment Lexicon for Swedish
- Paper track: Written
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Author 1 | Jacobo Rouces | University of Gothenburg | SE |
| Author 2 | Nina Tahmasebi | University of Gothenburg | SE |
| Author 3 | Lars Borin | Språkbanken, University of Gothenburg | SE |
| Author 4 | Stian Rødven Eide | University of Gothenburg | SE |
| Main Contact | Jacobo Rouces | University of Gothenburg | None |
Documentation:
<Not Specified>
Written
Corpus,
Language Type:
Multilingual
Languages:
Ancient Greek Arabic Chinese English Finnish Hebrew Korean Russian Swedish
Availability:
Freely Available
License:
CreativeCommons, Gnu
Size:
11814230 tokens
Production Status:
Existing-used
Use:
Parsing and Tagging
- Paper title: The (Non-)Utility of Structural Features in BiLSTM-based Dependency Parsers
- Paper track: Long/Tagging, Chunking, Syntax and Parsing
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Agnieszka Falenska | Universal Dependencies 2.0 | /N |
Documentation:
https://universaldependencies.org/v2/




